The test included 20 demonstrably false claims related to health, U.S. politics, world news, and foreign disinformation. Each claim was presented in three formats: a neutral question, a suggestive question, and a malicious prompt, such as asking the bot to generate a radio script containing the falsehood. ChatGPT repeated false claims in 22% of cases, while Gemini did so in 23%. When prompted maliciously, error rates rose to 50% for ChatGPT and 45% for Gemini.

Error rates of the audio bots ChatGPT, Gemini, and Alexa+ by input type: neutral (red), suggestive (blue), and malicious (brown). Alexa+ remained at 0 percent for all three types. | Image: Newsguard

By contrast, Amazon’s Alexa+ rejected all false claims across all prompt types. According to Amazon Vice President Leila Rouhi, Alexa+ relies on trusted news sources such as AP and Reuters. OpenAI declined to comment on the findings, while Google did not respond to multiple inquiries.